tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Abstract

The use of background knowledge is largely unexploited in text classification tasks. This paper explores word taxonomies as a means for constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short text classification problems: prediction of gender, personality type, age, news topics, drug side effects and drug effectiveness. The constructed semantic features, in combination with fast linear classifiers, tested against strong baselines such as hierarchical attention neural networks, achieve comparable classification results on short text documents. The algorithm’s performance is also tested in a few-shot learning setting, indicating that the inclusion of semantic features can improve the performance in data-scarce situations. The tax2vec capability to extract corpus-specific semantic keywords is also demonstrated. Finally, we investigate the semantic space of potential features, where we observe a similarity with the well known Zipf’s law.
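
As a rough illustration of what a taxonomy-based semantic feature can look like, the sketch below maps each token of a short text to its ancestors in a word taxonomy and counts them. It is a minimal sketch under stated assumptions, not the tax2vec algorithm itself: the toy taxonomy, the whitespace tokenizer, and the names `TOY_TAXONOMY`, `ancestors` and `semantic_features` are all hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); an assumption for illustration,
# not the taxonomy used in the paper.
TOY_TAXONOMY = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "drug": "substance",
    "headache": "symptom",
    "nausea": "symptom",
    "symptom": "condition",
}


def ancestors(term, taxonomy):
    """Return all ancestors of `term` in the taxonomy, nearest first."""
    chain = []
    seen = {term}
    while term in taxonomy:
        term = taxonomy[term]
        if term in seen:  # guard against accidental cycles
            break
        seen.add(term)
        chain.append(term)
    return chain


def semantic_features(text, taxonomy=TOY_TAXONOMY):
    """Count how often each taxonomy ancestor is evoked by the tokens of `text`."""
    counts = Counter()
    for token in text.lower().split():  # naive whitespace tokenizer (assumption)
        counts.update(ancestors(token, taxonomy))
    return counts


if __name__ == "__main__":
    print(semantic_features("Aspirin eased my headache"))
```

Counts of this kind could be appended to an ordinary bag-of-words representation before training a linear classifier. How tax2vec actually selects and weights such features is not specified in the abstract.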

Similar resources

On Constructing Shallow Taxonomies from Social Annotations

Tagging in social media systems has proven to be a convenient way for users to annotate objects of interest. One reason behind its success is obviously that tags can be chosen by users arbitrarily, without any topic or specificity constraints. Although tags are free-form keywords, there is some evidence suggesting that, for a particular object type, users tend to use “similar” tag sets....

Extracting Interpretable Features for Early Classification on Time Series

Early classification on time series data has been found highly useful in a few important applications, such as medical and health informatics, industry production management, safety and security management. While some classifiers have been proposed to achieve good earliness in classification, the interpretability of early classification remains largely an open problem. Without interpretable fea...

Towards Acquiring Case Indexing Taxonomies From Text

Taxonomic case-based reasoning is a conversational case-based reasoning methodology that employs feature subsumption taxonomies for incremental case retrieval. Although this approach has several benefits over standard retrieval approaches, methods for automatically acquiring these taxonomies from text documents do not exist, which limits its widespread implementation. To accelerate and simplify ...

Selecting Features for Ordinal Text Classification

We present four new feature selection methods for ordinal regression and test them against four different baselines on two large datasets of product reviews.

Boosting for Text Classification with Semantic Features

Current text classification systems typically use term stems for representing document content. Ontologies allow the usage of features on a higher semantic level than single words for text classification purposes. In this paper we propose such an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting, a successful machine learning tec...

Journal

Journal title: Computer Speech & Language

Year: 2021

ISSN: 1095-8363, 0885-2308

DOI: https://doi.org/10.1016/j.csl.2020.101104